The structure of the compiler may be broadly characterized by describing the
compilation phases and the data structures that they manipulate. The steps in
the compilation are called phases rather than passes since they don't
necessarily involve a full pass over the code. The data structure used to
represent the code at some point is called an intermediate
representation.
Two major intermediate representations are used in the compiler:
- The Implicit Continuation Representation (ICR) represents the lisp-level
semantics of the source code during the initial phases. Partial evaluation and
semantic analysis are done on this representation. ICR is roughly equivalent
to a subset of Common Lisp, but is represented as a flow-graph rather than a
syntax tree. Phases which only manipulate ICR comprise the "front end". It
would be possible to use a different back end such as one that directly
generated code for a stack machine.
- The Virtual Machine Representation (VMR) represents the implementation of
the source code on a virtual machine. The virtual machine may vary depending
on the the target hardware, but VMR is sufficiently stylized that most of the
phases which manipulate it are portable.
Each phase is briefly described here. The phases from ``local call analysis''
to ``constraint propagation'' all interact; for maximum optimization, they
are generally repeated until nothing new is discovered. The source files which
primarily contain each phase are listed after ``Files: ''.
- ICR conversion
- Convert the source into ICR, doing macroexpansion and simple source-to-source
transformation. All names are resolved at this time, so we don't have to worry
about name conflicts later on. Files: ir1tran, srctran, typetran
- Local call analysis
- Find calls to local functions and convert them to
local calls to the correct entry point, doing keyword parsing, etc. Recognize
once-called functions as lets. Create external entry points for
entry-point functions. Files: locall
- Find components
- Find flow graph components and compute depth-first ordering. Separate
top-level code from run-time code, and determine which components are top-level
components. Files: dfo
- ICR optimize
- A grab-bag of all the non-flow ICR optimizations. Fold
constant functions, propagate types and eliminate code that computes unused
values. Special-case calls to some known global functions by replacing them
with a computed function. Merge blocks and eliminate IF-IFs. Substitute let
variables. Files: ir1opt, ir1tran, typetran, seqtran, vm/vm-tran
- Type constraint propagation
- Use global flow analysis to propagate information about lexical variable
types. Eliminate unnecessary type checks and tests. Files: constraint
- Type check generation
- Emit explicit ICR code for any necessary type checks that are too complex to be
easily generated on the fly by the back end. Files: checkgen
- Event driven operations
- Various parts of ICR are incrementally recomputed, either eagerly on
modification of the ICR, or lazily, when the relevant information is needed.
- Check that type assertions are satisfied, marking places where type
checks need to be done.
- Locate let calls.
- Delete functions and variables with no references
Files: ir1util, ir1opt
- ICR finalize
- This phase is run after all components have been compiled. It scans the
global variable references, looking for references to undefined variables
and incompatible function redefinitions. Files: ir1final, main.
- Environment analysis
- Determine which distinct environments need to be allocated, and what
context needed to be closed over by each environment. We detect non-local
exits and set closure variables. We also emit cleanup code as funny
function calls. This is the last pure ICR pass. Files: envanal
- Global TN allocation (GTN)
- Iterate over all defined functions, determining calling conventions
and assigning TNs to local variables. Files: gtn
- Local TN allocation (LTN)
- Use type and policy information to determine which VMR translation to use
for known functions, and then create TNs for expression evaluation
temporaries. We also accumulate some random information needed by VMR
conversion. Files: ltn
- Control analysis
- Linearize the flow graph in a way that minimizes the number of branches. The
block-level structure of the flow graph is basically frozen at this point.
Files: control
- Stack analysis
- Maintain stack discipline for unknown-values continuation in the presence
of local exits. Files: stack
- Entry analysis
- Collect some back-end information for each externally callable function.
- VMR conversion
- Convert ICR into VMR by translating nodes into VOPs.
Emit type checks. Files: ir2tran, vmdef
- Copy propagation
- Use flow analysis to eliminate unnecessary copying of
TN values. Files: copyprop
- Representation selection
- Look at all references to each TN to determine which representation has the
lowest cost. Emit appropriate move and coerce VOPS for that representation.
- Lifetime analysis
- Do flow analysis to find the set of TNs whose lifetimes
overlap with the lifetimes of each TN being packed. Annotate call VOPs with
the TNs that need to be saved. Files: life
- Pack
- Find a legal register allocation, attempting to minimize unnecessary moves.
Files: pack
- Code generation
- Call the VOP generators to emit assembly code. Files: codegen
- Pipeline reorganization
- On some machines, move memory references
backward in the code so that they can overlap with computation. On machines
with delayed branch instructions, locate instructions that can be moved into
delay slots. Files: assem-opt
- Assembly
- Resolve branches and convert in to object code and fixup information.
Files: assembler
- Dumping
- Convert the compiled code into an object file or in-core
function. Files: debug-dump, dump, vm/core